Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus
Authors
Abstract
Department of Electrical Engineering and Computer Science, Graduate School of Engineering, Nagoya University, Chikusa-ku, Nagoya, 464-8603, Japan
[email protected], {matuyosi,kondoh}@sslab.nuee.nagoya-u.ac.jp

This paper describes a method for measuring the readability of Japanese texts based on a newly compiled textbook corpus. The corpus consists of 1,478 sample passages extracted from 127 textbooks for elementary school, junior high school, high school, and university; it is divided into thirteen grade levels, and its total size is about one million characters. For a given text passage, the method determines the grade level to which the passage is most similar by using character-unigram models constructed from the textbook corpus. Because the method requires neither sentence-boundary analysis nor word-boundary analysis, it is applicable to texts that include incomplete sentences and irregular text fragments. The performance of the method, measured by the correlation coefficient, is high (R > 0.9); even when the length of a text passage is limited to 25 characters, the correlation coefficient remains high (R = 0.83).
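The grade-matching idea in the abstract can be illustrated with a minimal sketch. The abstract states only that character-unigram models built from the corpus are used to find the most similar grade level; the scoring rule here (add-one smoothed log-likelihood) and all function names are assumptions made for illustration, not the authors' actual formulation.

```python
import math
from collections import Counter


def train_unigram_models(corpus_by_grade):
    """Build one character-frequency table per grade level.

    corpus_by_grade maps a grade label to a list of passages (hypothetical layout).
    """
    models = {}
    for grade, passages in corpus_by_grade.items():
        counts = Counter()
        for passage in passages:
            counts.update(passage)  # counts individual characters
        models[grade] = counts
    return models


def log_likelihood(text, counts, vocab_size):
    """Add-one smoothed log-likelihood of text under one unigram model.

    The smoothing scheme is an assumption; the abstract does not specify one.
    """
    total = sum(counts.values())
    return sum(math.log((counts[ch] + 1) / (total + vocab_size)) for ch in text)


def predict_grade(text, models):
    """Return the grade whose model assigns the text the highest likelihood."""
    vocab = set(text)
    for counts in models.values():
        vocab.update(counts)
    vocab_size = len(vocab)
    return max(models, key=lambda g: log_likelihood(text, models[g], vocab_size))
```

Because the score is a sum over individual characters, no sentence or word segmentation is needed, which matches the property the abstract highlights for incomplete sentences and short fragments.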
Similar resources
Measuring Readability for Japanese Learners of English
This paper describes the relative effectiveness of seven variables of three categories in predicting the readability of the EFL texts used in the Japanese context. The factors examined in our research were (1) word difficulty and (2) idiom difficulty, in addition to the commonly used variables, sentence length (SL) and word length (WL). In the analysis of word difficulty three measures were con...
Cognitively Motivated Features for Readability Assessment
We investigate linguistic features that correlate with the readability of texts for adults with intellectual disabilities (ID). Based on a corpus of texts (including some experimentally measured for comprehension by adults with ID), we analyze the significance of novel discourse-level features related to the cognitive factors underlying our users’ literacy challenges. We develop and evaluate a t...
Qualitative and Quantitative Examination of Text Type Readabilities: A Comparative Analysis
This study compared two main approaches to readability assessment. The quantitative approach applied idea density based on part-of-speech tagging and compared three sets of text types (i.e., narrative, expository, and argumentative) with respect to their ease of reading. The qualitative approach was done through developing questionnaires measuring intermediate EFL learners’ perceptions on content, motivat...
An analysis of a French as a Foreign Language Corpus for Readability Assessment
Readability aims to assess the difficulty of texts based on various linguistic predictors (the lexicon used, the complexity of sentences, the coherence of the text, etc.). It is an active field that has applications in a large number of NLP domains, among which machine translation, text simplification, text summarisation, or CALL (Computer-Assisted Language Learning). For CALL, readability tool...
Automatic Speech Transcription and Archiving System using the Corpus of Spontaneous Japanese
The target of automatic speech recognition (ASR) research has shifted from read speech to spontaneous speech. The technology will enable automatic transcription (and translation) of lectures and meetings. In Japan, the "Spontaneous Speech" project has been conducted over the last five years, and we set up the large "Corpus of Spontaneous Japanese (CSJ)", which consists of over 2000 speeches (500 hou...
Publication date: 2008